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DISCREPANCY-TOLERANT  HIERARCHICAL 
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Donald  P.  Gaver 


1 .   Introduction 

Consider  a  collection  of  J  entities  called  "units"  that  inde- 
pendently generate  events  in  accordance  with  Poisson  processes, 
each  with  rate  parameter  A . .   We  are  in  possession  of  observations 
on  each  of  these  processes,  and  have  seen  s.  events  for  a  time 
exposure  of  t.  for  the  j — ,  j  =  1,...,J.   Possibly  also  available 
are  concommitant  observations  on  other  variables  x  that  may  in 
part  influence  (explain)  the  values  of  the  rates  A . .   The  problem 
is  to  use  these  observations  to  describe  the  nature  and  extent  of 
the  variation  between  individual  unit  rates,  and  on  this  basis  to 
predict  (a)  the  future  event  generation  behavior  of  individual  units 
under  observation,  as  well  as  (b)  the  overall  rate  variability  of 
existing  units,  and  hence  the  likely  rate  behavior  of  other,  similar, 
units  not  yet  under  observation.   The  object  of  this  paper  is  to 
propose  and  examine  statistical  methods  for  approaching  the  above 
problems.   The  approach  emphasized  is  to  treat  the  unknown  rates 
as  being  describable  in  part  as  coming  from  a  fixed  population  of 
possible  rates,  and  then  to  describe  or  assess  that  population  and 
its  implications  for  estimating  the  individualized  unit  rates.   The 
approach  is  called  hierarchical  because  each  rate  may  be  viewed 
as  a  realization  of  some  random  variable  associated  with  a  higher- 
level  superpopulation  of  rates;  such  models  are  also  called  random 
parameter  or  parametric  empirical  Bayes  models;  see  Morris  (1983) 


for  a  review.   There  have  been  a  variety  of  applications  of  similar 
models  in  many  fields.   However,  particular  emphasis  is  given  in 
this  paper  to  analyses  that  invoke  discrepancy  tolerant  superpopu- 
lation  parametric  representations:   ones  yielding  estimating  pro- 
cedures that  may  assist  in  identification  of  distinct  rate  groupings, 
existence  of  apparently  discrepant  or  outlying  rates,  etc.,  a 
better  understanding  of  which  could  suggest  desirable  improvements 
for  systems  so  identified.   Of  course  this  latter  step  may  well 
lead  in  practice  to  a  change  in  the  superpopulation,  and  to  need 
for  an  updated  new  analysis.   The  steps  suggested  resemble  the  cycle 
of  data  analysis  and  modelling,  model  diagnosis  by  residual  and 
sensitivity  analysis,  and  repeat,  often  adopted  in  enlightened 
regression  analyses;  cf.  Mosteller  and  Tukey  (1977),  Belsley,  Kuh , 
and  Welsch  (1980),  and  elsewhere.   Some  ideas  of  discrepancy-tolerant 
or  robust  Bayesian  analyses  have  been  described  by  Berger  (1980),  (1984), 
who  references  Albert  (1979)  for  as-yet  unpublished  studies.   Ideas 
expressed  in  the  paper  of  Box  (1980) ,  with  discussion,  are  quite 
relevant,  as  well. 

This  paper  proceeds  by  first  introducing  hierarchical  Poisson 
models.   Specification  of  useful  parametric  forms  for  the  superpopu- 
lation that  describes  between-unit  variability  is  the  next  topic; 
this  is  followed  by  a  discussion  of  explicit  adjusted  estimates  for 
individual  event  rates  in  terms  of  superpopulation  parameters. 
Finally,  some  procedures  are  described  for  obtaining  estimates  of 
the  superpopulation  parameters.   The  estimation  procedure  effec- 
tiveness is  assessed  by  simulation,  and  the  technology  is  applied 
to  certain  sets  of  real  data. 


2 .   The  Hierarchical  Poisson  Model 

Introduce  as  a  starting  point  the  Poisson  process  of  events 

for  item  j  with  (conditional)  event  rate  X.(x.,e.)  where 

x.  =  (x,  .  ,x„  .  , . . . ,x  . )  is  a  vector  of  explanatory  (regression) 
3  ■*■  J   ^  3  P  J 

variables,  and  e.  is  (the  realization  of)  a  random  variable  with 

fixed  density  f   (x;£);  the  latter  describes  an  infinite  super- 

£j  th 

population  with  the  parameters  vector  6.   The  value  of  the  j — 

(1  <  j  <_  J)  latent  variable  z.    is  here  taken  to  be  fixed  for  all 
time,  once  drawn  from  the  superpopulation .   It  is  thus  a  random 
individualization  of  the  failure  rate  of  item  j;  while  x.  values 
account  for  differences  in  environmental  factors.   Generally,  the 
superpopulation  distribution  accounts  for  the  rate  variations  be- 
tween items  or  individuals  after  adjustment  for  environmental  effects 
explained  by  x . .   Of  course  the  manner  in  which  the  explanatory 
variables  are  used  can  influence  the  form  of  the  apparent  super- 
population  distribution,  and  the  selection  of  items  for  study,  if 
influenced  by  the  rate  values,  can  bias  the  estimation  of  super- 
population  parameters;  see  Lehoczky  (1984). 

Contrast  the  above  model  types  to  those  in  which  A .  is  a  random 
function  of  time:   e.g.  in  discrete  time,  monthly  or  weekly  perhaps, 

the  rate  changes  in  accordance  with  A.   =  A. (x. ,£.,),  and 

3t  J    -J       3t 

{ e  .  ,  t  =  1,2,...}  is  a  collection  of  possibly  iid  random  variables, 
or  perhaps  a  more  general  stochastic  process  changing  in  discrete 
or  continuous  time;  these  last  are  called  "random  environment" 
models,  or  more  specifically  doubly  stochastic  Poisson  models,  see 
Cox  and  Lewis  (1966),  Cox  and  Isham  (1980),  Gaver  (1963),  Reynolds 
and  Savage  (1971),  and  Burridge  (1981).   For  instance,  if  the 


integrated  hazard   /   A !  dt '  =  A.(t)  is  the  realization  of  a 

0     D         D 
gamma  stochastic  process,  then  the  original  Poisson  process  becomes 

a  negative  binomial  process.   Other  interesting  models  would  allow 

for  changes  in  the  superpopulation  as  a  result  of  event  observation 

and  remedial  action.   Consideration  of  all  of  these  latter  is 

beyond  the  scope  of  this  study;  this  paper  confines  its  attention 

to  the  simplest  random  individualization  hierarchical  structures. 


3 .   Two  Classical  Hierarchical  Models:   The  Log-Normal  and  Gamma 
Super -populations 

In  order  to  parameterize  a  superpopulation  of  even  rates  in  a 

simple  fashion,  both  the  log-normal  and  the  gamma  distribution 

have  been  utilized  historically.   Here  is  a  brief  discussion,  with 

modifications  reserved  for  the  following  section. 

3.1   Log-Normal  Superpopulation:   The  L/N/P  (Log-Normal  Poisson)  Model 
Suppose  that  the  Poisson  rate  of  item  j  is  of  the  form 

A .   =   exp(e  )     j=l,2,...,J  (3.1) 

2 
where  e.  ~  N(y.,o  ).   Let  y  .  =  y  +  x.B_,  x.  being  a  vector  of  covari- 

ates  and  $_   a  vector  of  regression  coefficients,  whenever  the  regression 

term  is  present;  otherwise  y.  =  y.   This  paper  does  not  consider  the 

fitting  of  regression  coefficients.   Each  of  the  J  items  is  exposed 

for  known  time  t.,  with  s.  (=  0,1,2,...)  observed  events  recorded  therein 

3         3 

This  (log-normal)  model  is  expecially  popular  in  the  Proba- 
bilistic Risk  Assessment  (PRA)  of  nuclear  reactor  safety  and 
operating  systems,  see  the  Reactor  Safety  Study  (WASH  1400)  (1975), 
and  subsequent  numerous  reports  on  this  topic;  in  particular 
note  Kaplan  (1983)  .   Items  may  be  in-plant  equipments  such  as 
continuously  acting  pumps,  valves,  and  control  devices  that  are 
subject  to  failure  events;  other  events  of  concern  are  so-called 
initiating  events  such  as  loss  of  feedwater,  pipe  breaks,  loss  of 
offsite  power  and  other  challenges  to  the  integrity  of  a  nuclear-- 
or  other--plant * s  safe  and  productive  operation.   The  failure  or 
initiating  event  occurrences  may  initially  be  taken  to  be  time- 
homogeneous  Poisson  stochastic  processes,  with  rates  that  vary 
between  design  copies  in  accordance  with  environmental  influences 


(manufacturer,  geographical  location,  plant  type,  ( sub) system  type, 
etc.) .   A  full  analysis  endeavors  to  estimate  the  influence  of 
the  explanatory  environmental  variables  as  represented  by  a  re- 
gression function,  and  the  properties  of  the  random  individualiza- 
tions, £.,  so  as  to  provide  (i)  good  estimates  of  the  superpopulation 
center  (mean)  and  variability  (variance)  and  (ii)  good  estimates 
of  the  individual  item  failure  rates.   An  individual  rate  estimate 
constructed  from  its  own  experience  alone  may  be  subject  to  con- 
siderable random  error;  pooling  its  data  with  that  of  other  similar 
items  can  reduce  that  random  error  at  the  possible  expense  of 
adding  bias.   Our  methods  suggest  pooling  that  is  reasonably  based 
on  superpopulation  characteristics  as  estimated  under  (i),  assuming 
that  the  superpopulation  is  (log) normal;  in  a  later  section  esti- 
mates are  developed  that  depend  less  on  such  a  special  form,  i.e. 
that  pool  more  selectively. 

Given  the  model  (3.1)  a  revised,  pooled  or  shrunken,  estimate 
of  the  rate  associated  with  item  j  can  be  obtained  by  Bayes '  formula, 
perhaps  in  the  form 

-—  ( -L) 

2 v   o     -A (y)  t  .       s . 

A    =   E[A  |s  ,x  ]   =   K.    /   A(y)e  e       :(A(y))  D dy , 

J  ~"  J     J     J  J   _oo 

(3.2) 

where  A(y)  =  exp(y),  y.  =  p+3_x.  ,  and  K.  is  a  normalizing  con- 
stant.  The  integral  sometimes  may  be  well  approximated  by  use  of 
Laplace's  method,  see  Tierney  and  Kadane  (1984).   However,  a 
likelihood  approach  provides  quick  and  interpretable  results: 

choose  e .  =  In    A .  to  maximize 
J       : 


2  e  2    °       -A(c.)t   (A(e.)t.)  : 

L.(e  ;y,o  ,s.,t.)   =   — - e    J   J  1—j ,     (3.3) 

J   J        J   J  /2tto  sj' 


2 
the  likelihood  of  e.    given  ij,o   and  the  observations.   Differen- 
tiate the  log  of  (3.3)  and  equate  to  zero  to  obtain  the  non-linear 
estimating  equation 

e •  p  .-e  .    , 

e  J   =   (s.  +  -2TJ-)  —■  (3.4) 

a       j 

The  nature  of  the  estimate  is  appreciated  if  we  let  £.(1)  =  ln(s./t.) 
be  an  initial  solution  (putting  s./t.  =l/(3t.)if  s.  =  0),  and  pass 
through  one  Newton-Raphson  iteration  to  obtain 

2 

s  .  In  (s  ./t  •  )  +  y  ./o 

en   1  -J 3_JIJ (3.5) 

3  s  .  +  (l/o  ) 

Since  a  delta-method  approximation  to  Var [In ( s ./t . ) |A.  =  A.] 

-:   :  -3  1 

is  just  1/A.t.  z    1/s  ,  the  estimate  (3.5)  is  the  linearly  weighted 
shrinkage  of  the  log-rate  estimate  towards  the  assumed  super- 
population  mean,  p,  with  weights  the  reciprocals  of  the  within 
and  between  (superpopulation)  variances,  familiar  in  the  normal/ 
Gaussian  distribution  Bayesian-conjugate  prior  framework. 

The  above  estimates  require  values  for  superpopulation  param- 

2 
eters  u.  =  u  +  x.3  and  a  .   Once  these  are  at  hand,  values  can  be 
3        -J- 

2 
computed  for  e..   Desirably,  p.  and  o   can  be  estimated  from  data 

on  all  J  items'  histories.   Various  options  exist  for  this;  some 

are  proposed  and  explored  in  a  later  section. 


3.2   Gamma  Superpopulation 

Consider  the  convenient  (conjugate)  alternative  model 


A.   =   U.  exp  (3x.) 
where  now  all  is  as  before  except  that  U.  -  Gamma (6, a) ,  meaning 


that  the  density  is,  for  u  >  0, 


j- ,        r        \  -6u  ( 6u)      r  i  ^    r  \ 

f ( u ; 6 , a )   =   e     — .  ,   6  (3.6) 


This  model  has  a  long  history,  for  it  leads  to  the  negative 
binomial  marginal  distribution  of  event  counts: 


P(s=k|x}      =      Ele"*1?11   JiifljLflU: 


oo  J^ 

-A(u)t     [A(u)t]        -,       .       >  ,  ,  .,    -,. 

e  - — ^i  f(u;6,a)du  (3.7) 


r(k+a)     .  6  °    .      texP(i^       k 

k!T  (a)     {S  +t  exp  (6  x)  ;        [6  +  t  exp  (6  x)  J 


and  to  a  simple  linear  formula  for  the  Bayes  estimate  of  the 
individualized  rate 


s  .  exp  (  6_x  .  )  +  a  exp  (B_  x  .  ) 

A.   E^  E[A.  Is.  ,t.x.]   =   -3 1-;- r—r-x — 3~         (3-8) 

D        -D  '  D   :-J  t .  exp  (6  Xj  )  +6 


Since  for  the  gamma, 


my      =      E[U]       =      f   ;  o^      =      Var[U]       =      -^-    ,  (3.9) 


then 

Var[U]  '  a      Var[U]  ' 

the  above  expression  can  be  expressed  as 


exp  (3_  x  .  )  t  .    s  .       , 

1 5f— 1»  •  't1'    +  viHur(mu  exP  <£  *j  ) ' 

Aj   "  exP(jx.)t.   " (3-10) 

m^  +   Var [U] 


which  is  again  a  linear  shrinking  of  the  raw  point  estimate  (s./t) 
towards  the  appropriate  superpopulation  mean,  here  expressed  as 
m.  exp  (3_x.)  ;  the  weights  are  again  recognizable  as  within 
(mn/exp(  6_  x  .  )  t  . )  and  between  (Var  [  U]  )  variance  components. 

Note  that  the  random  log-linear  model  (3.1)  is  only  one 
suggestive  model  form.   Conceivably  random  individualization  should 
be  of  multiplicative  form:   In  A.  =  c.  Bx.,  rather  than  additive, 
so  that  covariate  influence  varies  from  item  to  item.   Other  possi- 
bilities also  exist,  but  regression  effects  are  not  considered 
further  in  this  paper. 


4 .   Discrepancy-Tolerant  Versions  of  the  Log-Normal  (L/N/P)  Model: 
The  Log  Sculptured-Normal  Poisson  (L/S-N/P) . 

It  is  natural  to  generalize  the  L/N/P  model  as  follows.   Again 

put 


but  now 


A .   =   exp(e . )  ,  (4 . l,a) 


e.   =   In  x .   =   p.  +a<j>(z.)   =   W-  +  az-i^(z.)  ,        (4.1,b) 

where  p.  =  y  +  $_  x  .  ,  z.  -  N/(0,1),  and  \p  (z  .)    =    <p  (z  .)  /z  .    is  a 
sculpturing  function .  See  Gaver  (1983)  ,  and  also  Hoaglin  (  1983) 
for  an  account  of  certain  such  functions  in  the  normal  (Gaussian) 
context,  attributable  to  Tukey,  see  (1974).   Call  <M  z . )  a 
sculptured  normal .   The  purpose  of  (4.1,b)  is  to  describe  stretch- 
tailed  distributions  of  log-rates,  i.e.  those  that  exhibit  quite 
widely  straggling  exotic  or  extreme  values,  or  outliers,  both  above 
and  below  the  normal-like  central  part.   Illustrations  of  well- 
behaved  distributions  of  rates,  as  contrasted  to  straggling  tailed 
distributions,  and  even  multi-modal  distributions,  appear  in  Figs. 
1,  2,  and  3.   Our  methods  are  aimed  at  dealing  with  the  effects 
of  Fig.  2,  and, to  an  extent,  Fig.  3. 

Here  are  some  convenient  sculpturings  of  the  normal. 

1/2 

(A)   4>(z;n)  =  i — r  /n-2  [exp  (z  ( ^J-) -1]  ,   a  pseudo-t ; 

|Z|  (n-l)Z 

an  explicitly-invertible  approximation  to  a  true  Student  t 
with  unit  variance  if  n  >  2.   See  Gaver  and  Kafadar  (1984) . 
This  representation  is  used  extensively  later  in  the  paper. 

10 


(B)  <J>(z,n)  =  (/(n-2)  /n)  t  ,  a  true  Student  t  with  n  deg.  fr., 
again  with  unit  variance  if  n  >  2.   Parenthetically,  use  of 
a  finite  variance  t  is  in  no  way  essential. 

2 

(C)  <J>(z;h)  =  (l-4h)  3/4zehz  ,  0  <  h  <  j   a  member  of  Tukey's  h- family; 

see  Hoaglin  (1983),  and  Tukey  (1974). 

There  are  other  interesting  and  convenient  such  forms  that  give 
promise  of  providing  multi-modal  parametric  representations;  see 

Cobb  (1983).   All  of  our  above  forms  yield  distributions  of 

2 
e  =  In  A  that  are  unimodal  and  symmetric,  have  variance  a  ,  and 

are  more  stretch-tailed  than  the  basic  unit  normal,  z.   As  will 
be  seen,  this  latter  qualitative  modification  has  a  beneficial 
effect  upon  the  rate  estimator,  reducing  the  tendency  for  indis- 
criminate shrinkage  of  apparently  very  discrepant  observations. 

4.1   Nearly  Explicit  Discrepancy-Tolerant  (Controlled  Shrinkage) 
Estimators  of  Rates. 

The  several  sculptured  normal  representations  just  presented 

yield  directly  interpretable  rate  estimates  by  way  of  approximate 

likelihood  maximization,  as  in  (3.3)  .   Results  for  options  (A)  , 

(B) ,  and  (C)  are  sketched. 

(A) :   Invert  the  pseudo-t  to  find 

,   2.„.      ,,    <$>2  s-(n-l)2/2(n-3/2) 
exp(-z  /2)   =   (1  +^r2") 

Now  the  log-likelihood  associated  with  observation  j  is 
proportional  to 


11 


2  <J)2 

VWa2'sj'V    =   -2T^wln(1+^)  "  A(Vfcj 


+  s  lnA((J).)  ,  (4.2) 


and  is  expressed  in  terms  of  the  pseudo  t  individualization 

<J>  .  =  <{>(z.).   Differentiation  then  yields  the  estimating 

equation  for  A.,  or  d>  .  :   if  e  .  =  u.  +  a*  .  .  then 
13  3  3  3 


Z  .  \1  .-£.  .  , 

A.   =   e  J   =   (s .  -  (_i_-l)w  (n))  _L  (4.3) 

-1  -1      a     -1       j 


where 


2 
/  \         (n-1)  1  ,,  ,, 

wj(n)   =   (n-3/2)  (n-2)  ^~ ~ (4-4) 

1  +  ( J— i)   -±* 

a     n-2 


(B) :   The  log-likelihood  is  very  similar  to  (4.2);  one  obtains 
only  a  slightly  different  weight: 


z  •  v  . -z  .  , 

A    e   e  :   =   (s   -  (  3      J)w . (n) )  p-  ,  (4.5) 

with 


wi(n)   =   (H^I)  ^ (4*6) 

e  .-y  •  o  i 

1  +  (-2—1)  ^ 
a     n-2 


(C) :   In  this  case  the  convenient  individualization  parameter  is 

2 
a  normal  deviate,  z.,  where  <J>  ( z  . )  =  z.  exp(hz.): 
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U+0(J)(Z.) 

A.   =   e       J    =   (S   -  z  w  (h)  )  -±-  (4#7) 

J  J     J-  J      "- . 


and 


] 


( 1  -  4h)  **' 
w  -(h)   =   -i± iiH «■  (4.8) 

hz 


(1 


+  2hz2)e   ^ 


It  can  be  seen  that  (4.3),  (4.5),  and  (4.7)  all  can  on  occasion 
have  two  real  solutions,  corresponding  to  the  possibility  of  two 
modes  in  the  likelihood,  or  Bayesian  posterior,  for  <J>  .  .   Strict 
adherence  to  likelihood  doctrine  would  force  computation  of  each 
solution  and  a  check  to  see  which  globally  maximizes  likelihood-- 
a  possible  but  tedious  task.   The  same  is  true  of  a  Laplace  method 
approach,  which  requires  modification  to  account  for  the  bimodality. 
Consequently  it  is  proposed  to  simply  modify  the  estimate-dependent 
weights  (4.4)  and  (4.6)  to  estimated  weights  that  utilize 

the  raw-data  estimate  ln(s./t.)  in  place  of  <b  .  ,  so  in  each  case 

3   3  3 

w.(n)   =   .   ,C(r!?  .  , = (4.9) 

j  ln(s./t.)-p   2 

1  +  ( V^ -1)  <;r7> 

o  n-2 

which  can  be  computed  once;  this  modification  leads  to  a  unique, 
approximately  Bayes,  solution  with  reasonable  properties.   Of 
course  both  likelihood  (or  posterior)  bimodality  or  the  occurrence 
of  a  small  weight  value  may  suggest  the  need  for  model  changes  or 
other  action. 

Notice  that  in  each  case  the  weight  w.(n)  modifies  the  resulting 
estimate  towards  discrepancy  tolerance:   a  value  of  the  individualization 
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parameter  far  from  the  appropriate  superpopulation  mean  jj  .  exerts 

a  small  influence  on  the  shrinkage  term,  so  the  quoted  estimate 

A.  tends  to  be  nearly  the  raw  rate  s./t..   Interestinqly ,  all  of 

these  extended  tail  models  induce  weights  in  excess  of  unity  on 

observations  with  log-raw-rates  close  enough  to  y.;  this  might  be 

called  over- shrinkage,  and  is  noticeable  in  Table  1,  p.  79  of  Berger 

(1984),  wherein  a  normal  likelihood  is  combined  with  a  Cauchy  (one 

d.f.  t)  prior.   This  very  same  effect  has  been  pointed  out  by  Tukey 

in  (1974) ,  p.  132. 

An  interpretable  approximation  to  these  log  rate  estimates  is 

obtained  by  the  following  linearization:   in  (4.3)  or  (4.6)  start 

with  e.(l)  =  In (s./t-)  and  turn  the  Newton-Raphson  crank  once,  but 
I)  J       3 

evaluate  w.(n)  at  £.(1),  i.e.  utilize  (4.9).   Then 

(s . )ln(s ./t  .  )+(y ./a2)w.  (n) 

In  A.(l)   e   e.(l)   =   — 3 2 2 J 3 .    (4.10) 

3  3  (s.)  +  (1/a  )w.(n) 

The  term  (s.)  is  the  delta-method  estimate  of  (Var [In (s ./t . ) ] )   , 
so  the  estimate  quoted  is  seen  to  be  nearly  a  linear  combination 
of  the  raw  rate  estimate  and  the  individualized  mean,  with  the 
shrinkage  towards  the  latter  influenced  by  the  discrepancy 
( In (s  ./t . ) -y . ) /a  as  reflected  in  the  weights  w.(n);  small  dis- 
crepancies tend  to  shrink  the  estimate  towards  the  mean,  while 
large  discrepancies  are  tolerated,  i.e.  left  largely  without 
shrinkage  so  that  the  quoted  estimate  is  nearly  In  A .  -  log (s./t.). 
The  parameter  n  or  h  in  the  superpopulation  models  is  not  estimated 
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at  this  point,  but  treated  as  a  tuning  parameter:   the  smaller  n 
(n  >  2) ,  or  larger  h  (h  <  0.25)  the  greater  the  effect  of  the 
weights  upon  discrepant  observations.   In  practice,  n  =  4  has 
given  satisfactory  performance,  as  will  be  suggested  by  simulations 
and  trial  data  analyses.   For  a  similar  analysis  procedure  in  the 
more  classical  robustness  context  see  the  biweights  for  estimation 
of  a  distributional  center,  Mosteller  and  Tukey  (1977),  p.  353 
ff.;  our  weight  w.(n)  is  essentially  an  influence  function,  with 
degree  of  observational  influence  adjusted  by  choice  of  n  or  h , 
corresponding  to  the  parameter  c  in  biweight  technology. 
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5 .   Superpopulation  Parameter  Estimation 

In  order  to  provide  suitable  pooled,  shrunken,  estimates  of  an 
individual  item  rate  it  is  necessary  to  invoke  estimates  of  super- 
population  parameters.  Unfortunately,  the  superpopulation  variate 
values  are  not  observed  directly,  but  are  contaminated  by  "Poisson 
noise;"  this  complicates  the  task  of  parameter  estimation. 

Natural  approaches  to  the  estimation  problem  are  through 
moment  matching,  maximum  likelihood  or  Bayesian  approaches.   We 
suggest  variations  on  these  themes  that  require  different  degrees 
of  computer-intensive  effort,  and  are  of  differing  effectiveness. 
The  methods  advanced  for  consideration  have  been  compared  by  simula- 
tions.  The  limited  histories  typical  in  various  fields,  e.g. 
in  reliability  and  survival  studies,  and  in  nuclear  plant  risk, 
do  not  encourage  faith  in  the  validity  of  asymptotic  error  analyses 
without  such  corroboration. 


5.1   Likelihood  Estimation  for  the  Log  Sculptured-Normal  Poisson 
(L/S-N/P)  Model 

Suppose  that  a  time  history  of  length  t.  results  in  s.  events 

for  item  j  (j  =  1,2,..., J).   The  data  is  to  be  analyzed  with 

reference  to  the  general  L/S-N/P  model  of  (4.1) ,  but  for  the 

present  y.  =  y,  a  constant;  regression  will  be  discussed  later. 

Method  1:   Likelihood  By  Gauss-Hermite  Integration. 

2 
The  likelihood  of  the  parameter  y  and  o   given  the  data  and 


the  L/S-N/P  model  can  be  expressed  as 


i      -  , 


L(y,a;s,t)   =    n   L.(y,a,s.,t.)  (5.1, a) 

j=l   :        J       J 
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where 


12  s. 

00    2Z    -A(z)  t  .  (A(z)  t  .) 


L.(y,o  ;s.,t.)   =    /   e      3    -J dz  ,       (5.1,b) 


and 


log  A(z)   =   p  +  a<t>(z)   =   w  +  oziJ;(z)  (5.1,c) 

Throughout  what  follows  the  sculpturing  function,  \\>   or  equiva- 

lently  4>,  is  assumed  given. 

The  integrals  (5.1,b)  cannot  be  carried  out  analytically. 

_172 

Owing  to  the  appearance  of  e  z   ,  the  use  of  Gauss-Hermite 
numerical  integration  is  suggested.   In  the  notation  of  Abramowitz 
and  Stegun  (1968) , 


2  °°      -x2 

/2tt    L.(y,a    ;s.,t.)       =  /      e~         f.(x)dx      ^      I   vs^f.U.) 


where    z    =    /2x   and 


-A(/2x    )t.  s. 

f(x.)       =      e  X      3((A(/2x.)t.)     J    -iy    ;  (5.2) 

: 


the  x.#  w.  values  are  taken  from  tables.   A  grid  search  among 
11 
2 
u,  a   values  then  reveals  the  approximate  maximum  likelihood 
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solutions  from  (5.1).   It  has  been  discovered  that  care  is  required 

2 
in  the  choice  of  the  \i ,  o   start  when  analyzing  a  few  short  data 

histories.   Also,  the  straightforward  integration  is  numerically 

ill-conditioned. 

Method  2:   Quadratic  Approximation  (Laplace's  Method  Approach) . 

An  appealing  approximation  to  the  integral  (5.1,b)  is  obtained 

by  expressing  of  the  j —  likelihood  component  as 


L.  (y,a  ;s.,t.) 


-  e'2z2   -Qpj(z) 

'  ~7=~  e 

/2tt 


dz 


(5.4) 


where,  except  for  irrelevant  parameter-free  constants, 


Qpj(z)   =   A(z) t,  -  s.  lnX(z)  , 


(5.5) 


which  has  a  qualitatively  bowl-shaped  appearance.   Hence 
quadratically  approximate  Qp(z)  as  follows: 


Qpj(z) 


QpjOj)  +  k^V^'V  ' 


(5.6) 


6 .    being    the    solution   of 


Qp(z)       =      X'tzJtj 


-    s 


A'  (z) 
j      A  (z) 


=      0     , 


(5.7) 


so 


A(0j)       -      s./t.     , 


(5. 8, a) 
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or 


ln(s./t.)  -  p 
^Oj)   =   -L-^J ,  (5.8,b) 


namely 


,   ln(s./t.)  -  y 
0.   =   <J>    ( -L-^ )  .  (5.8,c) 


2 
Clearly  Q  . (G.)  is  independent  of  p  and  o   and  can  be  ignored, 


while 


QpjOj)   =  s.o2[<$>'  (Q.))2    ,  (5.9) 


and  so  the  approximate  likelihood  is  of  the  form 


-      2         -  1        Ve"^(8J)(-8J)2    

L.(y,a  )   =    /   -^-  e     - — dz/Q"(G.) 

3  -°°    /2tt  /2tt  F:   : 


=   exp[-iG2/(l  +  (Q"  (6.))  1)]  X  (5.10) 


v/l  +  (Q£.(6.)) 


Specific  adoption  of  sculpturing  option  (A) ,  the  pseudo-t,  pro- 
vides that 


fn  ,  v2  <J>(6.)  9  , 

|BZ1^_  log  [IM—X-)2-^]  sign  ♦(e.)        (5.11) 


and 

19 


QjjOj)   =  sja2[(J>,(6j)]2 


f  <t>2(0  .)/(n-2)   2 


_  ^2Q2  r  (n-2)  (n-3/2)  ,    v    j  ' 

=  s.a  0.1 - — ( J.       c )J      (5.12) 

3     3        (n-ir  ^V 


each  of  which  is  evaluated  from  (5.8,b)  .   Note  that  if  n  ■*  °°  then 

the  normal  superpopulation  case  is  obtained,  and  the  present 

2 
approximation  treats  log(s./t.)  as  approximately  N(y,o  +l/s-). 

Method  3:   Quadratic  Approximation  to  the  Log-Likelihood, 
Augmented  by  Gauss-Hermite  Integration 

The  previous  method  blithely  approximates  the  Poisson  log- 
likelihood  by  a  quadratic  in  order  to  achieve  convenient  and 
interpretable  results.   Consider  next  a  more  careful  approach  that 
combines  Methods  1  and  2.   To  do  so,  express  the  entire  exponent 
in  the  integrand  of  ( 5 . 4) --essentially  the  negative  log-likelihood-- 
as 


Q.(z)   =  \z2    +    Q  (z)   =   ^-z2  +  t.A(z)  -  s.  lnA(z)  .       (5.18) 

Let  9  .  be  a  solution  of  Q!(z)  =  0.   NOte  that  this  solution 
J  3 

will  not  be  as  explicit  as  before,  and  so  an  approximate  value 
may  sometimes  be  most  conveniently  used;  call  it  6 . .   Now  expand 
up  to  quadratic  terms  and  let 

Q.(z)       =      q    (z)     +    Q(6.)     +     (z-e.)Q!(6.)     +    \ ( z-6 . ) 2Q" ( 6    )  (5.19) 

where 


q.(z)        =       Q    (z)    -Q    (6.)    -  (z-0    )Q!  (0  .) -Uz-Q    )  ZQ"  (Q    )     ;  (5.20 

J  J  JJ  JJJ^JJJ 
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regard  q.(z)  as  the  error  in  the  quadratic  approximation.   Next 
complete  the  square  in  the  quadratic  terms  of  (5.19)  and  put 


x   =   — [(z-6  )/Q"(B  .)  +  Q'(9.)//Q"(eJ]  (5.21) 

/2      :       J  3D 


and  hence 

•2  x      Q'(ei} 
Z(x)   =  -     xJ,  +  9  (5.22) 

SQ"(Q.)         Q  (8j)     3 
It  thus  follows  that 

[Q'(6  )]2 

Q  (z(x))   =   q.(z(x)) 1 +  xZ  ,  (5.23) 

3  J  2Q"(6.) 

and  that  the  likelihood  assumes  the  form 


2 


2  ~    [Q!  (8.)]2      , 

L  (y,0  )   =   K  exp(-Q  (9  )+ =L^ )  I  (y,o")   ,      (5.24) 

J  J       J   J   2QV  (9.)     /Q'Mfi.)   : 

:   D        3    °D 


where 


2       ,°°  _v2  -q. (z(x)) 

I.  (p,a  )   =     /ee-1        dx  ,  (5.25) 

■J  _oo 


2 
and  K.  is  a  constant  independent  of  parameters  u    and  a    .   The 

idea  is  that  subtraction  of  the  quadratic  approximation  from 

Q.(z)  should  leave  a  relatively  minor  correction,  I.,  to  be 
3  3 
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evaluated  by  Gauss-Hermite  integration.   Finally,  then,  the  log- 
likelihood  is  of  the  form 


2       1   J      *  [Q* (6.) ]     , 

i(M,o    )      =  4  I    t-QO.)  +  J ilnQ"(9.)  +ln  I.}  ,     (5.26) 

J  j=l      J      2Q"  (6.)  -1        J 


2 
which  can  be  examined  for  maxima  over  \i    and  o   >  0.   NOte  that 

if  the  equation  Q!(z)  is  in  fact  solved  precisely,  then  Q'(0.)  =  0, 

and  the  correction  term  [Q'(9.)]  /2Q"(6.)  may  be  omitted. 

An  important  part  of  the  computation  involves  finding  6 . ,  an 

approximate  solution  to  the  equation 

A'  (z) 


Q!  (z)   =   0   =   z  +  A ' (z) t  .  -  s 


j  j     j  Mz) 

Parametrization  by  the  sculptured  form  In  A(z)  =  p  +  a<J>(z)  leads 
to  the  equation 


0   =   z  +  cr(t.A(z)  -  s.)<t»'(z)  (5. 27, a! 


or 


0   =   z  +  o(t.eM+a<f>(z)  -  s.)0'(z)  (5.27,b) 


At  this  point  it  becomes  desirable  to  introduce  a  specific, 
tractable,  parametric  form  for  cj)(z)  .   Consult  (4.3)  :   option  (A), 
a  pseudo-t,  is  handy,  and  will  be  adopted.   Here  are  some  details. 
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Option    (A)     (pseudo-t) . 
For    this    representation, 


2 
Q'(z)       =      ctBzt1  +^    /a)  (5.28) 


2 
where  a  =  n-2,  3  =  (n-3/2) / (n-1)   from  Gaver  and  Kafadar  (1984) 

requires  solving  the  following  equation  for  <f> : 


e»+0*      =       (s. (n-D2<t>/o 1  (5>29) 


j  (n-2)  (n-3/2)(l  +<$>2/(n-2))  fcj 


this  has  arisen  earlier  as  (4.3)  in  the  context  of  finding  an 
individualized  estimate  of  A . . 

To  simplify  calculations,  one  may  initially  take  as  an 
approximate  solution 


ln(s./t.)  -  y 
<$,  .       =      ^ 3 .  (5.30) 

3  o 


Reference  to  (4.3)  then  shows  that 


i.   =  (\    I-  In  (1  +&-)  )  sign(<J.)  (5.31) 


Furthermore,  for  this  determination  of  0 . , 

Q(8.)   =   y(e-)2  (5.32) 


Q'tOj)   =   0,  (5.33 
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9      -   l+(4>.)  /a 
Q"(6.)   =   l+os.(a6e.[ 1 ])  ,  (5.34) 

J 

2 
and  now  (5.26)  can  be  evaluated  for  any  \i ,  o    .   The  resulting 

J  2 

approximate  log  likelihood,   £   log  L.(p,o  ),  with  L.  as  in  (5.24) 

j=l      3  ^ 

and  above  can  be  explored  for  maxima  by  numerical  search.   Alterna- 

2 
tively,  numerical  integration  of  exp(-£(p,a  )),  see  (5.26),  with  suitab 

2 

(non-informative)  priors  for  p  and  o   results  in  Bayes  estimates; 

this  ambitious  numerical  step  has  not  yet  been  carried  out. 

A  more  detailed  investigation  may  begin  by  precise  determination 
of  solutions  to  (5.29).   Graphical  analysis  quickly  reveals  the 

possibility  of  three  real  solutions,  two  of  which  identify  local 

2 

maxima.   It  can  be  seen  that,  given  jj  and  o  ,  the  solution  in 

terms  of  e  =  {$-\\) /o    of  the  equation  (4.5)  modified  with  the 
weight  (4.9)  is  a  reasonable  choice  for  cf>  .  ,  which  is  then  converted 
to  6 •  by  (4.3).   Now 

Q'(e.)  =  e.  +  (x(e.)t  -s.)<t>'  (e.)  ,         (5.35) 

and 

Q"(6.)   =   1  +  o2t.  ((J)' (6.)  )2  +  o(A(8.)t.  -s.)J"(6.)        (5.36) 


where 


<t>'(e.)   =   aB  — l —  (1  +  4>2(0.)/a)  (5.37) 

3       (Me-j)        D 


and 
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<t>"  (6.)   =   — = (aSd  +4>  (6  .)/a)  (1  +aB9^)  -  ( <J> '  (6  .)  )  }  .     (5.38) 


The  Q .-derivatives  are  introduced  into  (5.24),  and  the  resulting 

2 
expression  is  searched  for  maximizing  values  of  p  and  o  . 
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6 .   Simulation  Testing 

The  procedures  described  for  carrhing  out  the  estimation  of 

2 

superpopulation  center  (y)  and  variance  (o  )#  and  the  individualized, 

or  selectively  shrunken,  estimates  of  rates,  e.g.  (4.3),  have 
been  appraised  by  simulations.   In  summary,  the  procedures 

described  deliver  satisfactory  results;  empirical  distributions  of 

2 
estimates  of  p  and  o   appear  to  center  close  to  the  values  input, 

and  the  average  distance  of  selectively  shrunken  individualized 

rate  estimates  from  the  true  (distance  being  mean  squared  error, 

median  absolute  deviation)  often  improve  upon  obvious  competitors. 

Of  course  these  statements  apply  only  to  the  range  of  parameter 

values  studied,  which  illustrate  those  encountered  in  certain 

nuclear  power  system  probabilistic  risk  assessments.   A  brief 

selection  of  many  simulations  appears  in  the  following  tables  and 

figures. 


6.1   Simulation  Design. 

A  simulation  requires  specification  of  the  superpopulation  form 
and  parameters,  the  sample  size,  the  exposure  times  t.  (j  =  1,2,..., J) 
and  the  sculpturing  function  $ ( • )  used  in  rate  production.   Note 
that  the  latter  function  need  not--and  here  will  not — be  the  same 
as  that  used  to  construct  individualized  rate  estimates. 

Specification  of  the  present  simulation  follows, 
(a)   Superpopulation  form  is  a  sculptured  normal  form  (C) ,  the 
Tukey  h  family,  from  which  actual  or  "true"  rates  are  easily  con- 
structed :   X  .  .      =   exp  [u  +  od)  (  z  ,  . .  )  ]  ,  where  z  ,  . ,  being  the  j —  ordered  o 
(3)  (3)  (3) 

2 
increasing  magnitude)  unit  normal,  where  <$>(z...)    =  z,.,exp(hz,  .,)  . 
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Simulation  of  a  sample  of  J  begins  by  obtaining  J  unit  normal 
deviates,  ordering  them,  and  computing  X...,    j  =  1,2,..., J. 

(b)  Having  X...,    and  having  specified  t.,  J  realizations  of 

independent  Poisson  random  variables,  or  counts,  are  generated; 

s.  corresponds  to  mean  A..,t.. 
3  (D)  3 

(c)  The  simulated  observations  (s.,t.;  j  =  1,2,..., J)  are  analyzed 

according  to  the  L/S-N/P  model  by  Method  3  to  obtain  point  esti- 

2 

mates  of  superpopulation  parameters  p  and  o  .   The  pseudo-t 

option  (A)  is  used  in  the  likelihood;  parameter  n  has  been  treated 
as  a  tuner,  either  specified  as  low,  e.g.  n  =  4,  yielding  highly 
restricted  or  selective  shrinkage,  or  as  high,  e.g.  n  =  50,  corres- 
ponding nearly  to  the  more  conventional  log  normal  model.   A 

2 

numerical  search  procedure  has  been  utilized  to  locate  y  and  a 

values . 

(d)  Individualized  estimates  A...  are  computed  by  these  options, 

2 
using  estimated  y  and  o   from  (c) : 


MLE:   A  ,  .  .  =  s^/t-:  (6.  1) 

(: )     J  3 


SSP:   A,.x  =  solution  of  (4.3)  using  w.(n)  =  1         (6.2) 
(j)  3 


RSP:   A,.x  =  solution  of  (4.3)  using  the  w . (n)  of  (4.9); 
(j)  D 

(6.3) 


The  abbreviations  are,  respectively,  for  maximum  likelihood, 
simple  shrinkage  Poisson,  and  restricted  shrinkage  Poisson. 

o 
(e)   Each  case  (J,p,o  ,  h , (t . ) ,n)  combination   is  independently 

simulated  200  times. 
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2 
(e)   Each  case,  i.e.  (J,y,a  ,h,t.,n)  combination,  is  independently  simu- 
lated 200  times  for  Table  6.1,  and  100  times  for  Table  6.2.   The 
mean-squared  errors  of  estimators  (6.1),  (6.2),  (6.3)  are  sum- 
marized— summary  figures  (sample  mean,  median,  standard  deviation, 
median  absolute  deviation,  and  mean  squared  error)  for  errors  of 
estimate  of  the  smallest  true  rate  A .,,   ,  the  median  true  rate 

A    1  ,  and  the  largest,  A,  >:   A,.>  -A,...   Choice  of  these  three 

(~2— )  D 

values  for  true  A  is  enough  to  reveal  the  different  behaviors  of 

the  candidate  estimators:   in  general  SSP  and  RSP  both  greatly 
improve  upon  simple  MLE  for  centrist  (median)  A  values  by  borrowing 
strength,  while  RSP  improves  upon  SSP  at  true  A  extremes  by  refusing 
to  overshrink . 

Tables  6.1  and  6.2  summarize  illustrative  sets  of  simulation  re- 
sults.  Note  that  the  estimates  of  the  superpopulation  mean,  u,  appear 

2 
close  to  being  unbiased,  while  those  of  the  variance  a   appear  to  be 

consistently  biased  downwards  in  Table  6.1  (J  =  15  ),  and  about 
right  in  Table  6.2  (J  =  45) .   Standard  error  of  estimate  (square- 
roots  of  the  variances  of  the  empirical  distributions  of  the  correspond- 
ing parameters)  are,  not  surprisingly,  substantial;  as  is  sensible, 
they  decrease  as  J  increases.   Nevertheless,  comparison  of  the  simu- 
lated MSE  figures  for  the  various  estimators  suggest  that  RSP, 
especially  for  n  =  4,  has  advantages:   for  the  smallest  rates,  A.,>, 
and  the  largest,  \(,       ,    RSP's  mse  more  nearly  resembles  the  MLE 
mse  performance  than  does  the  more  heavily  shrunken  SSP,  particularly 
for  n  =  50  and  75  which  imitate  the  action  of  a  log-normal  analysis; 

for  middle  values,  A/0>  and  A,0-,>,  both  RSP  and  SSP  estimates 

(  o  )        (  z  J  ; 

shrink  moderately,  all  improving  substantially  upon  the  MLE. 
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Table    6.1 

Selected   Mean    Squared   Error   Comparisons 
and    Estimated   Superpopulation   Parameters 


J    =    15,    h    =    0.15,    200    Simulations 


Mean  Squared  Error 


True 
Values 

Estimated 

(n  =  4)  :   p  = 

-0.97(0.41) 

Estimator 

A  ,, .  (small) 

A(8 

(median) 

A(15)(large) 

RSP 

0.016 

0.019 

0.33 

"2 

o  = 

0.17(0.15) 

SSP 

0.020 

0.020 

0.34 

p  =  -1.0 

o2  =  0.25 

MLE 

0.007 

0.030 

0.32 

(n  =  50)  :   p  = 

-0.98(0.45) 

RSP 

0.019 

0.020 

0.35 

~2 

o  = 

0.18(0.15) 

SSP 

0.019 

0.020 

0.35 

(n  =  4) :        p  =  -1.93(0.50)  RSP 

a2  =  0.18(0.17)  SSP 


0.0050 
0.0060 


0.0060 
0.0058 


0.28 
0.30 


P  =  ■ 

2 

o     = 


■2.0 
0.25 


(n  =  50) :      p  =  -1.93(0.52) 
o2  =  0.20(0.18) 


MLE 

0.0026 

0.014 

0.27 

RSP 

0.0053 

0.0057 

0.30 

SSP 

0.0054 

0.0057 

0.30 
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Table    6.2 

Selected  Mean    Squared   Error   Comparisons 
and   Estimated   Superpopulation   Parameters 


j    =    45      h    =    0.10,    t.    =    5;    100    Simulations 


True  Estimator     A.,,  (small)    A, 97>  (median)    AM[->  (large) 

Values         Estimated  U)  KZ6)  K^] 


p  =  0 .  50 
o2=0.35 


(n=4)  :  y  =0.50(0.25)   RSP        0.030        0.067        2.65 
o2  =0.41(0.29)   SSP        0.050       0.067        2.75 


MLE        0.011        0.13         2.61 


(n=75)  :p  =-0.56(0.30)   RSP        0.042       0.069        2.68 
a2  =0.44(0.28)   SSP        0.044       0.069        2.71 
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7 .   Analysis  of  Data 

Our  methodology  will  now  be  applied  to  several  sets  of  observa- 
tional failure  or  event  rate  data.   In  each  case  estimated  super- 
's) 
population  parameters  y  and  o^  are  quoted,  as  are  the  unpooled 

maximum  likelihood  individual  rate  estimates  (MLE) ,  the  simple 
linearized  shrunken  (Bayes)  estimates  (SSP) ,  and  the  discrepancy- 
tolerant  restricted  shrinkage  estimates  (RSP) ,  along  with  the 
weights  associated  with  each  of  the  latter.   It  appears  that  the 
results  so  obtained  contrast  interestingly,  with  the  RSP  behaving 
in  the  discrepancy-tolerant  manner  anticipated,  and  with  small  weights 
influencing  this  behavior,  especially  in  data  sets  for  which  J  is 
substantial . 

7.1   Ship  System  Failure  Rates 

The  numbers  of  failures  during  one  year  experienced  by  each  of 
J  =  254  individual  systems  aboard  a  Navy  ship  have  been  furnished 
by  Dr.  R.  Coile.   It  is  provisionally  assumed  that  all  systems  are 
exposed  to  failure  throughout  time,  and  that  the  failure  process  is 
nearly  Poisson;  neither  assumption  can  be  checked,  but  the  analysis 
is  of  interest.   The  analytical  results  are  in  Table  7.1. 

The  heavy  preponderance  of  zero  and  one-failure  systems  is 
recognized  by  both  SSP  and  RSP,  which  nearly  agree:   both  shrink 
in  the  same  direction  at  this  level.   However,  SSP  continues  to 
shrink  towards  low  rate  values  even  for  the  two  units  with  relatively 
high  observed  rates,  while  RSP  is  far  more  discrepancy-tolerant,  as 
dictated  by  the  corresponding  low  weights.   It  is  interesting  that 


our  procedures  estimate,  on  the  basis  of  a  log-normal  superpopulation , 

1 
2 
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1  "2 
an  expected  failure  rate  of  exp(u  +75-0  )  =0.54,  close  to  the  pooled 


TABLE  7  .  1 
SHIP  SYSTEM  FAILURE  RATES 


y  =  -1.34,   o2  =  1.46,   J  =  254 


Number  of  SSP  RSP 

Units  Failures  MLE  (n  =  75)  (n  =  4)  WT. 

178  0  0.00  0.20  0.22  1.8 

48  1  1.00  0.52  0.50  1.1 

16  2  2.00  1.0  1.2  0.75 

3  3  3.00  1.7  2.2  0.59 

6  4  4.00  2.5  3.1  0.51 

1  5  5.00  3.3  4.2  0.45 

1  9  9.00  6.8  8.2  0.34 

1  11  11.00  8.6  10.2  0.31 
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observed  rate  of  0.50.   The  median  failure  rate  is  exp(p)  =  0.26, 
and  since  over  one-half  of  the  observations  are  zeros,  it  is  also 
encouraging  that  the  estimated  median  rate  is  near  the  estimated 
rate  for  systems  with  zero  failures.   Finally,  a  calculation  of 
the  probability  of  zero  events  for  a  gamma  superpopulation  moment- 
matched  to  the  log-normal  with  the  observed  parameters  yields 
0.75,  which  may  be  compared  to  the  observed  fraction  178/254  =  0.70. 
Although  the  simple  calculation  is  perhaps  crude,  the  agreement 
is  gratifying. 

7.2   Loss  of  Feedwater  Flow 

Table  7.2  presents  a  set  of  data  referring  to  the  rates  of 
loss  of  feedwater  flow  for  a  collection  of  nuclear  power  genera- 
tion systems;  see  Kaplan  (1983).   The  corresponding  SSP ,  RSP 
derived  rates  are  included.   Once  again,  the  units  with  very  small 
(<  0.5)  weights,  namely  Systems  1,  3,  7,  18,  and  19,  all  display  marked 
differences  between  RSP  and  SSP,  with  the  resulting  SSP  estimates  exhibit- 
ing shrinkage  upwards  far  more  extensive  than  those  of  the  corresponding  RSP 
Systems  11  and  23  have  the  highest  observed  rates,  both  have  about 
the  same  times  of  exposures  and  nearly  the  same  computed  weights, 
and  both  RSP  estimates  are  slightly  less  shrunken,  thus  closer  to 
the  MLE,  than  are  those  from  SSP.   The  estimated  median  rate, 
calculated  on  the  basis  of  a  log-normal  superpopulation,  is 
exp(p)  =  2.56;  the  fraction  of  MLE  rates  equal  to  or  exceeding 
2.6  is  16/30  =  0.53;  the  corresponding  fractions  of  SSP  and  RSP 
rates  is  15/30  =  0.5,  in  good  agreement. 
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Table    7.2 
Loss    of    Feedwater    Flow    Rates 
y    =    0.94      o      =    0.31;         A=    exp(p  +a    /2)    =    2.99 


System 

s(j) 

t(j) 

MLE 

SSP  (n  = 75) 

RSP  (n  =  4) 

WT. 

1 

4 

15 

0.27 

0.59 

0.35 

0.19 

2 

40 

12 

3.3 

3.3 

3.2 

1.6 

3 

0 

8 

0.041 

0.59 

0.087 

0.063 

4 

10 

8 

1.3 

1.5 

1.5 

0.98 

5 

14 

6 

2.3 

2.4 

2.4 

1.8 

6 

31 

5 

6.2 

5.7 

5.8 

0.30 

7 

2 

5 

0.4 

1.0 

0.64 

0.27 

n 
O 

4 

4 

1.0 

1.5 

1.4 

0.74 

9 

13 

4 

3.3 

3.1 

3.0 

1.7 

10 

4 

3 

1.3 

1.7 

1.8 

1.1 

11 

27 

4 

6.8 

6.1 

6.2 

0.71 

12 

14 

4 

3.5 

3.3 

3.2 

1.6 

13 

10 

4 

2.5 

2.5 

2.5 

1.8 

14 

7 

2 

3.5 

3.2 

3.1 

1.6 

15 

4 

3 

1.3 

1.7 

1.8 

1.1 

16 

3 

3 

1.0 

1.5 

1.5 

0.74 

17 

11 

2 

5.5 

4.6 

4.6 

0.92 

18 

1 

2 

0.5 

1.4 

1.0 

0.34 

19 

0 

2 

0.17 

1.2 

0.41 

0.14 

20 

3 

1 

3.0 

2.3 

2.7 

1.7 

21 

5 

1 

5.0 

3.8 

3.7 

1.0 

22 

6 

1 

6.0 

4.3 

4.5 

0.83 

23 

35 

5 

7.0 

6.4 

6.6 

0.68 

24 

12 

3 

4.0 

3.6 

3.5 

1.4 

25 

1 

1 

1.0 

1.9 

1.8 

0.74 

26 

10 

3 

3.3 

3.1 

3.0 

1.6 

27 

5 

2 

2.5 

2.5 

2.5 

1.8 

28 

16 

4 

4.0 

3.7 

3.6 

1.4 

29 

14 

3 

4.7 

4.1 

4.1 

1.1 

30 

58 

11 

5.3 

5.1 

5.1 

0.98 
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7.3   Globe  Valve  Leak  Failures. 

These  data  pertain  to  nuclear  power  plant  globe  valve  leak- 
mode  failures,  categorized  according  to  operator  type,  the  source 
being  NPRDS  data;  see  Hill,  et  al  (1984),  p.  9,  where  the  results 
of  a  gamma  superpopulation  analysis  are  presented  and  discussed. 
The  data  are  categorized  by  operator  type,  and  the  between-category 
variability  is  described  by  a  superpopulation.   A  more  appropriate 
analysis  would  presumably  be  (known)  category  by  category,  with 
between-system  variability  within  each  known  category  described 
by  a  superpopulation.   Table  7.3  describes  the  data  and  estimates, 
but  also  includes  the  gamma  estimates  of  Hill  e_t  al . 

Table  7.3 
Globe  Valve  Leak  Failure  Rates 

y(4)  =  0.040,  o2(4)  =  1.1,   u(75)  =  0.18,   a2(75)  =  1.0 


Category 

s(j) 

t(j) 

MLE 

Gamma 
PEB 

SSP 
(n  =75) 

RSP 
(n  =4) 

Wl  . 

1 

31 

236.9 

0.131 

0.134 

0.138 

0.136 

0.60 

2 

157 

115.9 

1.35 

1.35 

1.35 

1.35 

1.7 

3 

30 

36.8 

0.815 

0.823 

0.816 

0.825 

l.G 

4 

13 

7.60 

1.71 

1.67 

1.63 

1.62 

1.6 

5 

7 

5.47 

1.28 

1.27 

1.22 

1.23 

1.8 

6 

7 

1.69 

4.14 

3.35 

3.38 

3.51 

0.96 

7 
8 

0 
0 

1.12 
0.55 

0.00 
0.00 

0.411 
0.559 

0.47 
0.59 

0.55 
(0.50) 
0.78 
(0.65) 

1.0 
(0.83) 
1.6 
(0.83) 
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The  gamraa-PEB  and  the  present  SSP-RSP  methodologies  give 
rather  comparable  results  for  the  first  six  categories.   The 
last  two  categories  differ  more  strikingly,  with  the  SSP-RSP 
procedure  shrinking  somewhat  more  extensively  than  the  gamma, 
the  RSP  weights,  especially  that  for  category  8,  are  surprisingly 
high;  this  is  believed  to  be  the  result  of  the  necessity  of 
approximating  the  assessment  of  discrepancy  on  the  log-scale  for 
s7  =  sR  =  0.   If  the  experiences  for  categories  7  and  8  are  pooled 
in  order  to  compute  weights,  then  the  perhaps  more  acceptable 
numbers  in  parentheses  result.   Although  the  weights  placed  on 
the  apparently  discrepant  rates  for  categories  1,  7,  and  8  are 
not  as  striking  as  might  be  wished,  they  are  of  interest.   It 
must  be  recognized  that  J  =  8  is  a  very  small  group  or  "sample." 
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